Report Assessment Ch2 Technology Visions

Data Management Report

Author

Rainer M. Krug

Abstract

A short description of what this is about. This is not a traditional abstract, but rather something else …

Working Title

IPBES_TCA_Ch2_technology

Code repo

GitHub - private

Build No: 39

The build number is automatically increased by one each time the report is rendered. It distinguishes different renderings of the same version.

Introduction

All searches are done on all works in OpenAlex. Searching within the TCA Corpus is not possible at the moment, but we are working on it.

The following steps are carried out and documented in this report:

Step 1: Determination of numbers

The search terms are based on the shared google doc. They are cleaned up for the usage in OpenAlex.
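As an illustration of what such a clean-up can look like, the sketch below builds an OR-combined search string from a vector of raw terms and then joins two such strings with AND, in the same way the queries in this report do. The helper build_search_string() and the example terms are hypothetical; the report's own vision_st, technology_st and compact() are defined elsewhere and may work differently.

```r
# Hypothetical helper (not part of the report's code base):
# turn raw terms, one per element, into a single OR-combined string.
build_search_string <- function(terms) {
    terms <- trimws(terms)         # strip leading and trailing whitespace
    terms <- terms[nzchar(terms)]  # drop empty entries
    terms <- unique(terms)         # drop exact duplicates
    paste(terms, collapse = " OR ")
}

# Made-up example terms, including whitespace, an empty entry and a duplicate
vision_example <- build_search_string(
    c(" vision ", "", "\"collective action\"", "vision")
)
technology_example <- build_search_string(
    c("technology", "\"machine learning\"")
)

# Combine both blocks with AND, wrapping each block in parentheses
combined_example <- paste0("(", vision_example, ") AND (", technology_example, ")")
print(combined_example)
```

Wrapping each block in parentheses keeps the OR terms grouped, so the AND applies to the two blocks as a whole rather than to the neighbouring terms only.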

Vision

The search term is vision

vision_count <- openalexR::oa_fetch(
    title_and_abstract.search = vision_st,
    count_only = TRUE,
    output = "list",
    verbose = TRUE
)$count

Technology

The search term is technology

technology_count <- openalexR::oa_fetch(
    title_and_abstract.search = compact(technology_st),
    count_only = TRUE,
    output = "list",
    verbose = TRUE
)$count

Vision AND technology

The search term is vision AND technology

Count

vision_technology_count <-
    openalexR::oa_fetch(
        title_and_abstract.search = compact(paste0("(", vision_st, ") AND (", technology_st, ")")),
        output = "list",
        count_only = TRUE,
        verbose = TRUE
    )$count

Count Subfields

vision_technology_subfields <- openalexR::oa_query(
    title_and_abstract.search = compact(paste0("(", vision_st, ") AND (", technology_st, ")")),
    group_by = "primary_topic.subfield.id",
    verbose = TRUE
) |>
    openalexR::oa_request() |>
    dplyr::bind_rows() |>
    dplyr::arrange(key)

## clean up missing or wrong vision_technology_subfields$key_display_name
## (names that are NA or purely numeric); as.numeric() warns for every
## non-numeric name, which is expected here, so the warning is suppressed
need_cleaning <- is.na(vision_technology_subfields$key_display_name) |
    !is.na(suppressWarnings(as.numeric(vision_technology_subfields$key_display_name)))
fine <- !need_cleaning

vision_technology_subfields <- vision_technology_subfields |>
    dplyr::filter(fine) |>
    dplyr::select(key, key_display_name) |>
    dplyr::distinct() |>
    merge(y = vision_technology_subfields[need_cleaning, -2], by = "key") |>
    dplyr::bind_rows(vision_technology_subfields[fine, ]) |>
    dplyr::group_by(key, key_display_name) |>
    dplyr::summarize(count = sum(count))
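
The name clean-up above can be hard to follow in pipeline form. The following base R sketch shows the same idea on made-up data: rows whose display name is missing or purely numeric take the valid name found for the same key, and the counts are then summed per subfield. The data frame and its values are invented for illustration, and the sketch uses a regular expression instead of as.numeric() to detect numeric names.

```r
# Made-up example data: two subfields have one broken display name each
subfields <- data.frame(
    key = c("s1", "s1", "s2", "s2", "s3"),
    key_display_name = c("Ecology", "1105", "Software", NA, "Forestry"),
    count = c(10, 5, 7, 3, 2),
    stringsAsFactors = FALSE
)

# A name needs repair when it is missing or consists of digits only
needs_repair <- is.na(subfields$key_display_name) |
    grepl("^[0-9]+$", subfields$key_display_name)

# Lookup table key -> valid name, taken from the rows that are fine
lookup <- unique(subfields[!needs_repair, c("key", "key_display_name")])

# Assign every row the valid name for its key; keys without any valid
# name end up as NA and are dropped by aggregate() below
subfields$key_display_name <- lookup$key_display_name[match(subfields$key, lookup$key)]

# Sum the counts per subfield
cleaned <- aggregate(count ~ key + key_display_name, data = subfields, FUN = sum)
print(cleaned)
```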

Download technology AND vision Corpus

The corpus download will be stored in data/pages and the arrow database in data/corpus.

This is not on github!

The corpus can be read by running get_corpus(), which opens the database so that it can be fed into a dplyr pipeline. After most dplyr functions, the actual data needs to be collected via collect().

Only then is the actual data read!
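
The deferred reading described above can be sketched with a small temporary dataset. This example assumes that get_corpus() behaves like arrow::open_dataset(); that is an assumption, since get_corpus() is not shown in this report. The toy data and the directory corpus_example are made up, and the block does nothing if the arrow or dplyr packages are not installed.

```r
# Sketch only: runs when both packages are available
if (requireNamespace("arrow", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
    corpus_dir <- file.path(tempdir(), "corpus_example")
    unlink(corpus_dir, recursive = TRUE)

    # Made-up works, written as a parquet dataset partitioned by year
    toy <- data.frame(
        id = c("W1", "W2", "W3"),
        publication_year = c(2019L, 2020L, 2020L),
        title = c("First work", "Second work", "Third work")
    )
    arrow::write_dataset(
        toy,
        path = corpus_dir,
        partitioning = "publication_year",
        format = "parquet"
    )

    # Opening the dataset reads metadata only, not the rows
    corpus <- arrow::open_dataset(corpus_dir)

    # The pipeline is recorded but not yet executed
    query <- corpus |>
        dplyr::filter(publication_year == 2020) |>
        dplyr::select(id, title)

    # collect() executes the query and returns the rows as a data frame
    result <- dplyr::collect(query)
    print(result)
}
```

Filtering and selecting before collect() keeps the amount of data loaded into memory small, which matters for a corpus of this size.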

The download needs to be enabled by setting eval: true in the code block below.

tic()
pages_dir <- file.path(".", "data", "pages")

if (!dir.exists(pages_dir)) {
    dir.create(
        path = pages_dir,
        showWarnings = FALSE,
        recursive = TRUE
    )

    years <- oa_fetch(
        title_and_abstract.search = compact(paste0("(", vision_st, ") AND (", technology_st, ")")),
        group_by = "publication_year",
        paging = "cursor",
        verbose = FALSE
    )$key

    #######
    #######
    processed <- list.dirs(
        path = pages_dir,
        full.names = FALSE,
        recursive = FALSE
    ) |>
        gsub(
            pattern = paste0("^pages_publication_year=", ""),
            replacement = ""
        )

    interrupted <- list.files(
        path = pages_dir,
        pattern = "^next_page.rds",
        full.names = TRUE,
        recursive = TRUE
    ) |>
        gsub(
            pattern = paste0("^", pages_dir, "/pages_publication_year=", ""),
            replacement = ""
        ) |>
        gsub(
            pattern = "/next_page.rds$",
            replacement = ""
        )

    completed <- processed[!(processed %in% interrupted)]

    years <- years[!(years %in% completed)]
    #######
    #######

    pbmcapply::pbmclapply(
        sample(years),
        function(y) {
            message("Getting data for year ", y, " ...")
            output_path <- file.path(pages_dir, paste0("pages_publication_year=", y))

            dir.create(
                path = output_path,
                showWarnings = FALSE,
                recursive = TRUE
            )

            data <- oa_query(
                title_and_abstract.search = compact(paste0("(", vision_st, ") AND (", technology_st, ")")),
                publication_year = y,
                options = list(
                    select = c("id", "doi", "authorships", "publication_year", "title", "abstract_inverted_index", "topics")
                ),
                verbose = FALSE
            ) |>
                IPBES.R::oa_request_IPBES(
                    count_only = FALSE,
                    output_path = output_path,
                    verbose = TRUE
                )
        },
        mc.cores = 8,
        mc.preschedule = FALSE
    )
}
toc()
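
The resume logic in the block above (skip years whose download finished, retry the others) can be tried in isolation. The sketch below simulates the directory layout in a temporary folder. It assumes that a remaining next_page.rds file marks an interrupted download, which is how the code above treats it; the years and file contents are made up.

```r
pages_dir <- file.path(tempdir(), "pages_example")
unlink(pages_dir, recursive = TRUE)

# Simulate three downloaded years; 2021 still contains a next_page.rds
# marker, taken here to mean that its download was interrupted
for (y in c(2020, 2021, 2022)) {
    dir.create(
        file.path(pages_dir, paste0("pages_publication_year=", y)),
        recursive = TRUE
    )
}
saveRDS("cursor", file.path(pages_dir, "pages_publication_year=2021", "next_page.rds"))

# Years reported by the query (made up)
years <- as.character(2019:2022)

# Years for which a directory exists
processed <- sub(
    "^pages_publication_year=", "",
    list.dirs(pages_dir, full.names = FALSE, recursive = FALSE)
)

# Years whose directory still holds the interruption marker
interrupted <- list.files(pages_dir, pattern = "^next_page\\.rds$", recursive = TRUE) |>
    dirname() |>
    sub(pattern = "^pages_publication_year=", replacement = "")

completed <- setdiff(processed, interrupted)
remaining <- setdiff(years, completed)
print(remaining)
```

Only the year that was never started and the interrupted year remain, so a re-run does not download completed years again.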

The fields author and topics are serialized in the arrow database and need to be unserialized by using unserialize_arrow() on a dataset containing the two columns.

tic()

pages_dir <- file.path(".", "data", "pages")
arrow_dir <- file.path(".", "data", "corpus")

years <- list.dirs(
    path = pages_dir,
    full.names = TRUE,
    recursive = FALSE
)

years_done <- list.dirs(
    path = arrow_dir,
    full.names = TRUE,
    recursive = FALSE
)

years <- years[
    !(
        gsub(
            x = years,
            pattern = paste0("^", pages_dir, "/pages_publication_year="),
            replacement = ""
        ) %in% gsub(
            x = years_done,
            pattern = paste0("^", arrow_dir, "/publication_year="),
            replacement = ""
        )
    )
]

pbapply::pblapply(
    sample(years),
    function(year) {
        message("\n     Processing year ", year, " ...\n")
        pages <- list.files(
            path = year,
            pattern = "^page_",
            full.names = TRUE,
            recursive = TRUE
        )
        data <- parallel::mclapply(
            pages,
            function(page) {
                p <- readRDS(file.path(page))$results |>
                    openalexR::works2df(verbose = FALSE)
                p$author_abbr <- IPBES.R::abbreviate_authors(p)
                return(p)
            },
            mc.cores = 3 # params$mc.cores
        ) |>
            do.call(what = rbind)

        saveRDS(
            data,
            file = file.path(paste0(year, ".rds"))
        )
        # unlink(year, recursive = TRUE)

        data <- serialize_arrow(data)

        # parquetize::write_parquet_at_once(
        #     data = data,
        #     path_to_parquet = arrow_dir,
        #     partition = "yes",
        #     partitioning = "publication_year"
        # )
        arrow::write_dataset(
            data,
            path = arrow_dir,
            partitioning = "publication_year",
            format = "parquet",
            existing_data_behavior = "overwrite"
        )
    }
)
toc()
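
Nested columns such as author and topics hold one data frame per work, which is why they are serialized before being written. The sketch below shows one possible round trip using base R serialize() and unserialize(). The helpers serialize_column() and unserialize_column() are hypothetical and are not the serialize_arrow() and unserialize_arrow() functions used in this report, whose implementation may differ.

```r
# Hypothetical helpers for illustration only
serialize_column <- function(x) {
    # one ASCII-serialized string per list element
    vapply(
        x,
        function(el) rawToChar(serialize(el, connection = NULL, ascii = TRUE)),
        character(1)
    )
}
unserialize_column <- function(x) {
    # restore the original R object from each string
    lapply(x, function(s) unserialize(charToRaw(s)))
}

# Made-up works with a nested author column (one data frame per work)
works <- data.frame(id = c("W1", "W2"))
works$author <- list(
    data.frame(au_display_name = c("A. Author", "B. Author")),
    data.frame(au_display_name = "C. Author")
)

# Flatten the nested column into plain character strings for storage
stored <- works
stored$author <- serialize_column(works$author)

# Restore the nested structure after reading the data back
restored <- stored
restored$author <- unserialize_column(stored$author)
print(restored$author[[1]])
```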

Results

vision

Hits for search term vision: 105,460,634 hits

Individual terms combined by OR:

assess_search_term(readLines(file.path("input", "vision.txt"))) |>
    dplyr::arrange(desc(count)) |>
    dplyr::mutate(count = formatC(count, format = "f", big.mark = ",", digits = 0)) |>
    knitr::kable()
term count
model 20,891,172
process 16,657,221
effect 13,615,349
approach 12,672,122
value 11,112,814
activity 9,840,219
performance 9,798,018
technique 9,152,900
influence 8,594,285
response 8,422,272
relationships 6,961,065
objective 6,730,111
solution 6,634,662
strategy 6,105,603
image 6,079,152
view 5,442,188
future 5,360,041
target 5,216,076
reaction 4,775,118
knowledge 4,733,730
project 4,290,558
project 4,290,558
policy 4,230,471
action 3,913,889
plan 3,907,385
operation 3,769,414
culture 3,628,666
perspective 3,383,505
task 2,796,993
effort 2,631,417
government 2,489,453
idea 2,252,953
opportunity 2,207,225
transmission 2,166,731
respect 2,070,271
perception 1,712,297
platform 1,636,234
existence 1,579,064
movement 1,512,521
scenarios 1,465,861
innovation 1,227,918
desire 1,136,393
visioning 1,110,170
vision 1,110,169
reality 976,453
story 948,854
conceptual 910,432
motivation 866,390
appearance 847,693
responsibilities 842,047
visualization 822,406
initiative 792,333
moment 786,653
hope 752,940
discourse 717,849
iteration 705,168
cooperation 681,807
mission 661,198
territory 459,789
intention 442,065
agenda 409,168
wish 404,610
dialogue 378,413
consultation 323,963
aspiration 299,694
fiction 292,857
spiritual 272,707
co-production 241,709
imagery 240,307
creativity 239,153
universe 230,193
dream 199,031
sight 191,072
imagination 169,255
inspiration 163,994
cosmology 159,474
harmony 120,164
coalition 110,297
self-determination 98,094
solidarity 92,333
fantasy 75,545
roadmap 67,543
worldview 66,179
reciprocity 59,800
ceremony 59,583
“collective action” 40,612
visionary 28,357
programm 26,392
foresight 23,186
“participatory process” 6,488
cosmovision 2,890
“deliberate process” 607
communit 449
cosmocentric 98
languague 83
arquetype 5

technology

Hits for search term technology: 14,910,126 hits

Individual terms combined by OR:

assess_search_term(readLines(file.path("input", "technology.txt"))) |>
    dplyr::arrange(desc(count)) |>
    dplyr::mutate(count = formatC(count, format = "f", big.mark = ",", digits = 0)) |>
    knitr::kable()
term count
Technology 6,347,888
Software 2,464,826
Machine-to-Machine 2,126,672
Internet 1,244,253
“Social Media” 1,058,283
Virtualization 1,032,183
Robotics 773,127
“Machine Learning” 672,794
“Deep Learning” 392,779
“Artificial Intelligence” 323,808
“Renewable Energy” 255,296
“Biotechnology” 234,801
IOT 213,053
“Big Data” 191,308
“Internet of Things” 180,579
“Computer Vision” 129,313
“Virtual Reality” 124,149
“Cloud Computing” 121,492
E-commerce 118,516
Nanotechnology 100,731
5G 95,526
Blockchain 88,810
“Natural Language Processing” 79,654
“Augmented Reality” 67,453
“Speech Recognition” 66,117
“3D Printing” 65,304
“Smart Grid” 56,482
“Genetic Engineering” 41,845
“Genetic engineering” 41,845
“Autonomous Vehicle” 41,260
“Digital Transformation” 40,029
“Circular Economy” 37,747
Cybersecurity 35,519
“Clean Energy” 33,679
“Blockchain Technology” 32,128
“Data Science” 30,839
“Edge Computing” 30,136
“Cyber-Physical Systems” 27,826
“Smart Home” 26,299
“Quantum Computing” 25,546
“Digital Twin” 22,858
Cryptocurrency 22,105
“Space Technology” 17,484
Fintech 14,770
“Application Programming Interface” 13,409
“Mixed Reality” 11,706
“Facial Recognition” 9,935
“Wearable Technology” 8,120
Microservices 7,552
“Sustainable Technology” 7,052
“Digital Currency” 5,708
“Agile Development” 4,860
“Computational Technology” 3,661
DevOps 3,480
“Digital Wallet” 961
“Internet Safety” 923
“Internet Privacy” 742
“Digital Ethics” 478

vision AND technology

Hits for search term vision AND technology: 11,981,973 hits

Subfields

The subfields are based on the primary topic assigned to each work. Other topics are also assigned, but this one has been identified as the main topic by an algorithm. count is the number of works in the vision AND technology corpus that have been assigned to the subfield.

Please take a look at these subfields to identify the ones to be filtered out.

The easiest approach is to download the Excel file through the button and to mark the subfields to be filtered out.

IPBES.R::table_dt(vision_technology_subfields, fixedColumns = NULL, fn = "Vision Technology Subfields")

Citation

BibTeX citation:
@report{krug,
  author = {Krug, Rainer M.},
  title = {Report {Assessment} {Ch2} {Technology} {Visions}},
  doi = {XXXXXX},
  langid = {en},
  abstract = {A short description of what this is about. This is not a
    traditional abstract, but rather something else ...}
}
For attribution, please cite this work as:
Krug, Rainer M. n.d. “Report Assessment Ch2 Technology Visions.” IPBES Data Management Report. https://doi.org/XXXXXX.